12 research outputs found

    Speech Enhancement Using Bayesian Estimators of the Perceptually-Motivated Short-Time Spectral Amplitude (STSA) with Chi Speech Priors

    In this paper, the authors propose new perceptually-motivated Weighted Euclidean (WE) and Weighted Cosh (WCOSH) estimators that utilize more appropriate Chi statistical models for the speech prior with Gaussian statistical models for the noise likelihood. Whereas the perceptually-motivated WE and WCOSH cost functions emphasize spectral valleys rather than spectral peaks (formants) and indirectly account for auditory masking effects, the incorporation of the Chi distribution statistical models demonstrates distinct improvement over the Rayleigh statistical models for the speech prior. The estimators incorporate both weighting law and shape parameters on the cost functions and distributions. Performance is evaluated in terms of the Segmental Signal-to-Noise Ratio (SSNR), Perceptual Evaluation of Speech Quality (PESQ), and Signal-to-Noise Ratio (SNR) Loss objective quality measures to determine the amount of noise reduction along with overall speech quality and speech intelligibility improvement. Based on experimental results across three different input SNRs and eight unique noises, along with various weighting law and shape parameters, the two general, less complicated, closed-form WE and WCOSH estimators with Chi speech priors provide significant gains in noise reduction and noticeable gains in overall speech quality and speech intelligibility over the baseline WE and WCOSH estimators with the standard Rayleigh speech priors. Overall, the goal of the work is to capitalize on the mutual benefits of the WE and WCOSH cost functions and the Chi distributions for the speech prior to improve enhancement performance.
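    The SSNR metric used in this evaluation averages per-frame SNRs in decibels, conventionally clamped to a fixed range so silent frames do not dominate. A minimal sketch (the function name, frame length, and clamping range of -10 to 35 dB are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, eps=1e-10,
                  floor_db=-10.0, ceil_db=35.0):
    """Frame-based segmental SNR in dB, clamped to a typical [-10, 35] dB range."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        e = enhanced[k * frame_len:(k + 1) * frame_len]
        # The residual (s - e) is treated as the noise in this frame.
        noise_energy = np.sum((s - e) ** 2) + eps
        snr_db = 10.0 * np.log10(np.sum(s ** 2) / noise_energy + eps)
        snrs.append(np.clip(snr_db, floor_db, ceil_db))
    return float(np.mean(snrs))
```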

    Optimal Distributed Microphone Phase Estimation

    This paper presents a minimum mean-square error spectral phase estimator for speech enhancement in the distributed multiple microphone scenario. The estimator uses Gaussian models for both the speech and noise priors under the assumption of a diffuse incoherent noise field representing ambient noise in a widely dispersed microphone configuration. Experiments demonstrate significant benefits of using the optimal multichannel phase estimator as compared to the noisy phase of a reference channel.
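    The closed-form estimator derived in the paper is not reproduced here; as a rough illustration of combining spectral phases across distributed channels, one can take a reliability-weighted circular mean of the per-channel phases (the function name and weighting scheme are assumptions):

```python
import numpy as np

def fuse_phase(noisy_stfts, channel_weights):
    """Reliability-weighted circular mean of per-channel spectral phases.

    noisy_stfts: (channels, freq) complex STFT bins for one frame.
    channel_weights: (channels,) nonnegative per-channel reliability weights,
    e.g. estimated channel SNRs (an illustrative choice, not the paper's).
    """
    # Sum unit phasors weighted by channel reliability; the angle of the
    # resulting sum vector is the weighted circular mean of the phases.
    phasors = np.exp(1j * np.angle(noisy_stfts))
    w = np.asarray(channel_weights, dtype=float)[:, None]
    return np.angle(np.sum(w * phasors, axis=0))
```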

    Distributed Multichannel Speech Enhancement Based on Perceptually-Motivated Bayesian Estimators of the Spectral Amplitude

    In this study, the authors propose multichannel weighted Euclidean (WE) and weighted cosh (WCOSH) cost function estimators for speech enhancement in the distributed microphone scenario. The goal of the work is to illustrate the advantages of utilising additional microphones and modified cost functions for improving signal-to-noise ratio (SNR) and segmental SNR (SSNR) along with log-likelihood ratio (LLR) and perceptual evaluation of speech quality (PESQ) objective metrics over the corresponding single-channel baseline estimators. As with their single-channel counterparts, the perceptually-motivated multichannel WE and WCOSH estimators are functions of a weighting law parameter, which influences attenuation of the noisy spectral amplitude through a spectral gain function, emphasises spectral peak (formant) information, and accounts for auditory masking effects. Based on the simulation results, the multichannel WE and WCOSH cost function estimators produced gains in SSNR improvement, LLR output, and PESQ output over the single-channel baseline results and unweighted cost functions, with the best improvements occurring for negative values of the weighting law parameter across all input SNR levels and noise types.

    Distributed Multichannel Speech Enhancement with Minimum Mean-square Error Short-time Spectral Amplitude, Log-spectral Amplitude, and Spectral Phase Estimation

    In this paper, the authors present optimal multichannel frequency domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators utilize Rayleigh and Gaussian statistical models for the speech prior and noise likelihood with a diffuse noise field for the surrounding environment. Based on the Signal-to-Noise Ratio (SNR) and Segmental Signal-to-Noise Ratio (SSNR) along with the Log-Likelihood Ratio (LLR) and Perceptual Evaluation of Speech Quality (PESQ) as objective metrics, the multichannel LSA estimator decreases background noise and speech distortion and increases speech quality compared to the baseline single-channel STSA and LSA estimators, with the optimal multichannel spectral phase estimator contributing significantly to the improvements and demonstrating robustness under time alignment and attenuation factor estimation. Overall, the optimal distributed microphone spectral estimators show strong results in noisy environments with application to many consumer, industrial, and military products.
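    The exact MMSE-STSA and LSA gain functions involve Bessel and exponential-integral terms and are not reproduced here; as a simplified stand-in, a Wiener gain applied after averaging the channels (consistent with the diffuse, incoherent noise assumption) sketches the processing chain. The function names and the crude maximum-likelihood a priori SNR estimate are assumptions:

```python
import numpy as np

def wiener_gain(xi):
    """Wiener spectral gain from the a priori SNR xi = speech power / noise power.

    A simplified stand-in for the MMSE-STSA/LSA gains, which additionally
    involve Bessel and exponential-integral terms.
    """
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)

def enhance_frame(channel_spectra, noise_power):
    """Average the channels (diffuse, incoherent noise assumption), then apply
    a per-bin Wiener gain using a crude ML estimate of the a priori SNR."""
    avg = np.mean(channel_spectra, axis=0)  # (freq,) complex spectrum
    # Averaging M channels with incoherent noise reduces noise power by 1/M.
    eff_noise = noise_power / channel_spectra.shape[0]
    xi = np.maximum(np.abs(avg) ** 2 / eff_noise - 1.0, 0.0)
    return wiener_gain(xi) * avg
```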

    Deterministic Seirs Epidemic Model for Modeling Vital Dynamics, Vaccinations, and Temporary Immunity

    In this paper, the author proposes a new SEIRS model that generalizes several classical deterministic epidemic models (e.g., SIR, SIS, SEIR, and SEIRS) involving the relationships between the susceptible S, exposed E, infected I, and recovered R individuals for understanding the proliferation of infectious diseases. As a way to incorporate the most important features of the previous models under the assumption of homogeneous mixing (mass-action principle) of the individuals in the population N, the SEIRS model utilizes vital dynamics with unequal birth and death rates, vaccinations for newborns and non-newborns, and temporary immunity. In order to determine the equilibrium points, namely the disease-free and endemic equilibrium points, and study their local stability behaviors, the SEIRS model is rescaled with the total time-varying population and analyzed according to its epidemic condition R0 for the two cases of no epidemic (R0 ≤ 1) and epidemic (R0 > 1) using the time-series and phase portraits of the susceptible s, exposed e, infected i, and recovered r proportions. Based on the experimental results using a set of arbitrarily-defined parameters for horizontal transmission of the infectious diseases, the proportional population of the SEIRS model consisted primarily of recovered r (0.7–0.9) and susceptible s (0.0–0.1) individuals in the epidemic case, and of recovered r (0.9) individuals with only a small proportion of susceptible s (0.1) individuals in the no-epidemic case. Overall, the initial conditions for the susceptible s, exposed e, infected i, and recovered r individuals converged to the corresponding equilibrium point for local stability: the disease-free equilibrium X̄_DFE (no epidemic) and the endemic equilibrium X̄_EE (epidemic).
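    The rescaled SEIRS dynamics described above can be sketched with a forward-Euler integration. The parameter values, the restriction to equal birth and death rates, and newborn-only vaccination are simplifying assumptions for illustration, not the paper's full model:

```python
def simulate_seirs(s0, e0, i0, r_init, beta=0.9, sigma=0.2, gamma=0.1,
                   omega=0.02, mu=0.01, nu=0.2, dt=0.1, steps=5000):
    """Forward-Euler integration of a proportional SEIRS model with vital
    dynamics (equal birth/death rate mu, so proportions sum to one),
    a newborn vaccination fraction nu, and temporary immunity (waning
    rate omega returning recovered individuals to the susceptible pool).
    """
    s, e, i, r = s0, e0, i0, r_init
    for _ in range(steps):
        ds = mu * (1.0 - nu) - beta * s * i + omega * r - mu * s
        de = beta * s * i - (sigma + mu) * e
        di = sigma * e - (gamma + mu) * i
        dr = mu * nu + gamma * i - (omega + mu) * r
        s, e, i, r = s + dt * ds, e + dt * de, i + dt * di, r + dt * dr
    return s, e, i, r
```

    For this parameterization the epidemic condition R0 = beta * sigma / ((sigma + mu)(gamma + mu)) exceeds one, so the trajectory approaches the endemic equilibrium rather than the disease-free one.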

    Distributed multichannel processing for signal enhancement

    The goal of this work is to generalize speech enhancement methods from single channel microphones, dual channel microphones, and microphone arrays to distributed microphones. The focus has been on developing and implementing robust and optimal time domain and frequency domain estimators for estimating the true source signal in this configuration and measuring the performance improvement with both objective (e.g., signal-to-noise ratios) and subjective (e.g., listening tests) metrics. Statistical estimation techniques (e.g., minimum mean-square error or MMSE) with Gaussian speech priors and Gaussian noise likelihoods have been used to derive solutions for five basic classes of estimators: (1) time domain; (2) spectral amplitude; (3) perceptually-motivated spectral amplitude; (4) spectral phase; and (5) complex real and imaginary spectral component. Experimental work using different true source signal attenuation factors (e.g., unity, linear, and logarithmic) demonstrates significant gains in segmental signal-to-noise ratio (SSNR) with an increase in the number of microphones. Of particular importance is the inclusion of the optimal MMSE spectral phase estimator alongside the spectral amplitude estimators. Overall, the statistical estimators show tremendous promise for distributed microphone speech enhancement of noisy acoustic signals with application to many consumer, industrial, and military products under severely noisy environments.

    Improvements of the Beta-Order Minimum Mean-Square Error (MMSE) Spectral Amplitude Estimator using Chi Priors

    In this paper, the authors propose the Beta-Order Minimum Mean-Square Error (MMSE) Spectral Amplitude estimator with Chi statistical models for the speech prior. The new estimator incorporates both a shape parameter on the distribution and a cost function parameter. The performance of the Beta-Order MMSE Spectral Amplitude estimator with the Chi speech prior is evaluated using the Segmental Signal-to-Noise Ratio (SSNR) and Perceptual Evaluation of Speech Quality (PESQ) objective quality measures. From the experimental results, the new estimator provides gains of 0-3 dB in SSNR improvement and 0-0.3 in PESQ improvement over the corresponding Beta-Order MMSE Spectral Amplitude estimator with the standard Rayleigh statistical models for the speech prior.

    Automatic Song-Type Classification and Speaker Identification of Norwegian Ortolan Bunting Emberiza Hortulana Vocalizations

    This paper presents an approach to song-type classification and speaker identification of Norwegian Ortolan Bunting (Emberiza Hortulana) vocalizations using traditional human speech processing methods. Hidden Markov models (HMMs) are used for both tasks, with features including mel-frequency cepstral coefficients (MFCCs), log energy, and delta (velocity) and delta-delta (acceleration) coefficients. Vocalizations were tested using leave-one-out cross-validation. Classification accuracy for 5 song-types is 92.4%, dropping to 63.6% as the number and similarity of the songs increase. Song-type dependent speaker identification rates peak at 98.7%, with typical accuracies of 80-95% and a low end at 76.2% as the number of speakers increases. These experiments fit into a larger framework of research working towards methods for acoustic censusing of endangered species populations and more automated bioacoustic analysis methods.
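    The delta (velocity) coefficients mentioned above are conventionally computed with a regression over neighboring MFCC frames; the HTK-style formula is d_t = sum_{n=1..N} n(c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). A sketch of that computation (the function name and edge handling are assumptions):

```python
import numpy as np

def delta_features(cepstra, N=2):
    """Regression-based delta (velocity) coefficients over MFCC frames.

    cepstra: (frames, coeffs) array. Edge frames are handled by
    repeating the first/last frame, a common convention.
    """
    padded = np.pad(cepstra, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    deltas = np.zeros(cepstra.shape, dtype=float)
    for t in range(cepstra.shape[0]):
        acc = np.zeros(cepstra.shape[1])
        for n in range(1, N + 1):
            # padded[t + N] corresponds to cepstra[t].
            acc += n * (padded[t + N + n] - padded[t + N - n])
        deltas[t] = acc / denom
    return deltas
```

    Delta-delta (acceleration) coefficients follow by applying the same regression to the deltas.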

    Multichannel Speech Recognition Using Distributed Microphone Signal Fusion Strategies

    Multichannel fusion strategies are presented for the distributed microphone recognition environment, applied to the task of song-type recognition in a multichannel songbird dataset. The signals are first fused together based on various heuristics, including their amplitudes, variances, physical distances, or squared distances, before the enhanced single-channel signal is passed into the speech recognition system. The intensity-weighted fusion strategy achieved the highest overall recognition accuracy of 94.4%. By combining the noisy distributed microphone signals in an intelligent way that is proportional to the information contained in the signals, speech recognition systems can achieve higher recognition accuracies.
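    An intensity-weighted fusion rule of the kind described, with per-channel weights proportional to signal energy and normalized to sum to one, can be sketched as follows (the function name and the exact weighting heuristic are assumptions, not the paper's precise formulation):

```python
import numpy as np

def intensity_weighted_fusion(signals):
    """Fuse multichannel signals into one channel using weights
    proportional to per-channel energy, normalized to sum to one.

    signals: (channels, samples) array; returns a single fused channel,
    so channels with more signal energy contribute proportionally more.
    """
    signals = np.asarray(signals, dtype=float)
    energies = np.sum(signals ** 2, axis=1)
    weights = energies / np.sum(energies)
    return weights @ signals
```

    The fused single channel can then be fed to an unmodified single-channel recognizer, which is the design choice the paper's pipeline reflects.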